In this article, we design new turbo codes that can achieve near-Shannon-limit
performance. The design criterion for random interleavers is based on maximizing
the effective free distance of the turbo code, i.e., the minimum output weight of
codewords due to weight-2 input sequences. An upper bound on the effective free
distance of a turbo code is derived. This upper bound can be achieved if the feedback
connection of convolutional codes uses primitive polynomials. We review multiple
turbo codes (parallel concatenation of q convolutional codes), which increase the
so-called “interleaving gain” as q and the interleaver size increase, and a suitable
decoder structure derived from an approximation to the maximum a posteriori
probability decision rule. We develop new rate 1/3, 2/3, 3/4, and 4/5 constituent
codes to be used in the turbo encoder structure. These codes, with 2 to 32
states, are designed by using primitive polynomials. The resulting turbo codes have
rates b/n, b = 1, 2, 3, 4, and n = 2, 3, 4, 5, 6 and include random interleavers for better
asymptotic performance. These codes are suitable for deep-space communications
with low throughput and for near-Earth communications where high throughput is
desirable. The performance of these codes is within 1 dB of the Shannon limit at a
bit-error rate of $10^{-6}$ for throughputs from 1/15 up to 4 bits/s/Hz.


I. Introduction 

Coding theorists have traditionally attacked the problem of designing good codes by developing codes 
with a lot of structure, which lends itself to feasible decoders, although coding theory suggests that 
codes chosen “at random” should perform well if their block sizes are large enough. The challenge 
to find practical decoders for “almost” random, large codes has not been seriously considered until 
recently. Perhaps the most exciting and potentially important development in coding theory in recent 
years has been the dramatic announcement of “turbo codes” by Berrou et al. in 1993 [7]. The announced 
performance of these codes was so good that the initial reaction of the coding establishment was deep 
skepticism, but recently researchers around the world have been able to reproduce those results [15,19,8]. 
The introduction of turbo codes has opened a whole new way of looking at the problem of constructing 
good codes [5] and decoding them with low complexity [7,2]. 

Turbo codes achieve near-Shannon-limit error correction performance with relatively simple component
codes and large interleavers. A required $E_b/N_0$ of 0.7 dB was reported for a bit-error rate (BER) of $10^{-5}$
for a rate 1/2 turbo code [7]. Multiple turbo codes (parallel concatenation of q > 2 convolutional codes)
and a suitable decoder structure derived from an approximation to the maximum a posteriori (MAP)
probability decision rule were reported in [9]. In [9], we explained for the first time the turbo decoding
scheme for multiple codes and its relation to the optimum bit decision rule, and we found rate 1/4 turbo
codes whose performance is within 0.8 dB of Shannon’s limit at BER = $10^{-5}$.

In this article, we (1) design the best component codes for turbo codes of various rates by maximizing 
the “effective free distance of the turbo code,” i.e., the minimum output weight of codewords due to 
weight-2 input sequences; (2) describe a suitable trellis termination rule for b/n codes; (3) design low 
throughput turbo codes for power-limited channels (deep-space communications); and (4) design high- 
throughput turbo trellis-coded modulation for bandwidth-limited channels (near-Earth communications). 


II. Parallel Concatenation of Convolutional Codes 

The codes considered in this article consist of the parallel concatenation of multiple (q > 2) con- 
volutional codes with random interleavers (permutations) at the input of each encoder. This extends 
the original results on turbo codes reported in [7], which considered turbo codes formed from just two 
constituent codes and an overall rate of 1/2. 

Figure 1 provides an example of parallel concatenation of three convolutional codes. The encoder
contains three recursive binary convolutional encoders with $m_1$, $m_2$, and $m_3$ memory cells, respectively.
In general, the three component encoders may be different and may even have different rates. The first
component encoder operates directly (or through $\pi_1$) on the information bit sequence $u = (u_1, \ldots, u_N)$
of length N, producing the two output sequences $x_0$ and $x_1$. The second component encoder operates
on a reordered sequence of information bits, $u_2$, produced by a permuter (interleaver), $\pi_2$, of length N,
and outputs the sequence $x_2$. Similarly, subsequent component encoders operate on a reordered sequence
of information bits. The interleaver is a pseudorandom block scrambler defined by a permutation of N
elements without repetitions: A complete block is read into the interleaver and read out in a specified
(fixed) random order. The same interleaver is used repeatedly for all subsequent blocks.

Figure 1 shows an example where a rate r = 1/n = 1/4 code is generated by three component codes
with memory $m_1 = m_2 = m_3 = m = 2$, producing the outputs $x_0 = u$, $x_1 = u \cdot g_1/g_0$, $x_2 = u_2 \cdot g_1/g_0$,
and $x_3 = u_3 \cdot g_1/g_0$ (here $\pi_1$ is assumed to be an identity, i.e., no permutation), where the generator
polynomials $g_0$ and $g_1$ have octal representation $(7)_{octal}$ and $(5)_{octal}$, respectively. Note that various code
rates can be obtained by proper puncturing of $x_1$, $x_2$, $x_3$, and even $x_0$ (for an example, see Section V).
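
The encoder structure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the results in this article: the function names (`rsc_parity`, `make_interleavers`, `turbo_encode_rate_1_4`) are hypothetical, the interleavers are drawn from Python's pseudorandom generator, trellis termination is omitted, and the generators are taken as $g_0 = 1 + D + D^2$ (octal 7) and $g_1 = 1 + D^2$ (octal 5).

```python
import random

def rsc_parity(u, g0_taps, g1_taps):
    """Parity stream of a recursive systematic encoder (1, g1/g0).

    Taps are coefficient lists [c_0, ..., c_m] for D^0..D^m, with g0_taps[0] == 1.
    """
    m = len(g0_taps) - 1
    state = [0] * m                      # state[i] holds the feedback bit a_{k-1-i}
    parity = []
    for bit in u:
        a = bit                          # a_k = u_k + sum_{i>=1} g0_i a_{k-i} (mod 2)
        for i in range(1, m + 1):
            a ^= g0_taps[i] & state[i - 1]
        x = g1_taps[0] & a               # x_k = sum_{i>=0} g1_i a_{k-i} (mod 2)
        for i in range(1, m + 1):
            x ^= g1_taps[i] & state[i - 1]
        parity.append(x)
        state = [a] + state[:-1]
    return parity

def make_interleavers(n, seed=0):
    """Identity for the first encoder, pseudorandom permutations for the other two."""
    rng = random.Random(seed)
    return [list(range(n)), rng.sample(range(n), n), rng.sample(range(n), n)]

def turbo_encode_rate_1_4(u, perms, g0_taps, g1_taps):
    """Return the streams [x0, x1, x2, x3]; x0 is the systematic stream."""
    streams = [list(u)]
    for perm in perms:
        permuted = [u[p] for p in perm]
        streams.append(rsc_parity(permuted, g0_taps, g1_taps))
    return streams
```

With these generators, the weight-2 input $1 + D^3$ drives the first encoder around its zero-input loop once and back to the all-zero state, giving a parity stream of weight 4.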

We use the encoder in Fig. 1 to generate an $(n(N + m), N)$ block code, where the m tail bits of code 2
and code 3 are not transmitted. Since the component encoders are recursive, it is not sufficient to set
the last m information bits to zero in order to drive the encoder to the all-zero state, i.e., to terminate
the trellis. The termination (tail) sequence depends on the state of each component encoder after N
bits, which makes it impossible to terminate all component encoders with m predetermined tail bits.
This issue, which had not been resolved in the original turbo code implementation, can be dealt with
by applying a simple method described in [8] that is valid for any number of component codes. A more
complicated method is described in [18].

A design for constituent convolutional codes, which are not necessarily optimum convolutional codes,
was originally reported in [5] for rate 1/n codes. In this article, we extend those results to rate b/n
codes. It was suggested (without proof) in [2] that good random codes are obtained if $g_0$ is a primitive
polynomial. This suggestion, used in [5] to obtain “good” rate 1/2 constituent codes, will be used in this
article to obtain “good” rate 1/3, 2/3, 3/4, and 4/5 constituent codes. By “good” codes we mean codes
with a maximum effective free distance $d_{ef}$, i.e., those codes that maximize the minimum output weight for
weight-2 input sequences, as discussed in [9], [13], and [5] (because this weight tends to dominate the
performance characteristics over the region of interest).


III. Design of Constituent Encoders

As discussed in the previous section, maximizing the weight of output codewords corresponding to
weight-2 data sequences gives the best BER performance for a moderate bit signal-to-noise ratio (SNR)
as the random interleaver size N gets large. In this region, the dominant term in the expression for bit
error probability of a turbo code with q constituent encoders is

$$P_b \approx \frac{\beta}{N^{q-1}}\, Q\!\left(\sqrt{2r\,\frac{E_b}{N_0}\left(\sum_{j=1}^{q} d^p_{j,2} + 2\right)}\right)$$

where $d^p_{j,2}$ is the minimum parity-weight (weight due to parity checks only) of the codewords at the
output of the jth constituent code due to weight-2 data sequences, and $\beta$ is a constant independent of
N. Define $d_{j,2} = d^p_{j,2} + 2$ as the minimum output weight including parity and information bits, if the jth
constituent code transmits the information (systematic) bits. Usually one constituent code transmits the
information bits (j = 1), and the information bits of others are punctured. Define $d_{ef} = \sum_{j=1}^{q} d^p_{j,2} + 2$ as
the effective free distance of the turbo code and $1/N^{q-1}$ as the “interleaver’s gain.” We have the following
bound on $d^p_2$ for any constituent code.
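
To make the roles of the effective free distance and the interleaver's gain concrete, the sketch below evaluates a dominant term of the form $(\beta/N^{q-1})\,Q(\sqrt{2r\,d_{ef}\,E_b/N_0})$ with $d_{ef} = \sum_j d^p_{j,2} + 2$. It is an illustration only: the function names are hypothetical, the value of $\beta$ is not given in this article and is set to 1 as a placeholder, and the operating point used in the note that follows is an arbitrary choice.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def dominant_ber_term(ebno_db, rate, parity_weights, n_interleaver, beta=1.0):
    """Return (approximate P_b, d_ef) for q = len(parity_weights) constituent codes.

    parity_weights[j] is the minimum parity weight of constituent code j for
    weight-2 inputs; beta is an unknown constant, 1.0 here by assumption.
    """
    q = len(parity_weights)
    d_ef = sum(parity_weights) + 2            # effective free distance
    ebno = 10.0 ** (ebno_db / 10.0)           # dB to linear
    gain = 1.0 / n_interleaver ** (q - 1)     # interleaver's gain 1/N^(q-1)
    p_b = beta * gain * q_function(math.sqrt(2.0 * rate * d_ef * ebno))
    return p_b, d_ef
```

For two constituent codes with parity weight 10 each, `dominant_ber_term(1.0, 1/3, [10, 10], 16384)` uses $d_{ef} = 22$; adding a third constituent code would both raise $d_{ef}$ and change the gain from $1/N$ to $1/N^2$.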

Theorem 1. For any r = b/(b + 1) recursive systematic convolutional encoder with generator matrix

$$G = \left[\; I_{b \times b} \;\middle|\; \begin{matrix} h_1(D)/h_0(D) \\ h_2(D)/h_0(D) \\ \vdots \\ h_b(D)/h_0(D) \end{matrix} \;\right]$$

where $I_{b \times b}$ is a $b \times b$ identity matrix, $\deg[h_i(D)] \le m$, $h_i(D) \ne h_0(D)$, $i = 1, 2, \ldots, b$, and $h_0(D)$ is a
primitive polynomial of degree m, the following upper bound holds:

$$d^p_2 \le \left\lfloor \frac{2^{m-1}}{b} \right\rfloor + 2$$

Proof. In the state diagram of any recursive systematic convolutional encoder with generator matrix
G, there exist at least two nonoverlapping loops corresponding to all-zero input sequences. If $h_0(D)$ is a
primitive polynomial, there are two loops: one corresponding to zero-input, zero-output sequences with
branch length one, and the other corresponding to zero-input but nonzero-output sequences with branch
length $2^m - 1$, which is the period of maximal length (ML) linear feedback shift registers (LFSRs) [14]
with degree m. The parity codeword weight of this loop is $2^{m-1}$, due to the balance property [14] of ML
sequences. This weight depends only on the degree of the primitive polynomial and is independent of
$h_i(D)$, due to the invariance to initial conditions of ML LFSR sequences. In general, the output of the
encoder is a linear function of its input and current state. So, for any output we may consider, provided
it depends on at least one component of the state and it is not $h_0(D)$, the weight of a zero-input loop is
$2^{m-1}$, by the shift-and-add property of ML LFSRs.


Consider the canonical representation of a rate b/(b + 1) encoder [20] as shown in Fig. 2 when the
switch is in position A. Let $S^k(D)$ be the state of the encoder at time k with coefficients $S^k_0, S^k_1, \ldots, S^k_{m-1}$,
where the output of the encoder at time k is

$$x = S^{k-1}_{m-1} + \sum_{i=1}^{b} u^k_i h_{i,m} \qquad (1)$$

The state transition for input $u^k_1, \ldots, u^k_b$ at time k is given by

$$S^k(D) = \left[\sum_{i=1}^{b} u^k_i h_i(D) + D\,S^{k-1}(D)\right] \bmod h_0(D) \qquad (2)$$

From the all-zero state, we can enter the zero-input loop with nonzero input symbols $u_1, \ldots, u_b$ at state

$$S^1(D) = \sum_{i=1}^{b} u_i h_i(D) \bmod h_0(D) \qquad (3)$$

From the same nonzero input symbol, we leave exactly at state $S^{2^m-1}(D)$ back to the all-zero state,
where $S^{2^m-1}(D)$ satisfies

$$S^1(D) = D\,S^{2^m-1}(D) \bmod h_0(D) \qquad (4)$$

i.e., $S^{2^m-1}(D)$ is the “predecessor” to state $S^1(D)$ in the zero-input loop. If the most significant bit of
the predecessor state is zero, i.e., $S^{2^m-1}_{m-1} = 0$, then the branch output for the transition from $S^{2^m-1}(D)$
to $S^1(D)$ is zero for a zero-input symbol. Now consider any weight-1 input symbol, i.e., $u_j = 1$ for j = i
and $u_j = 0$ for $j \ne i$, $j = 1, 2, \ldots, b$. The question is: What are the conditions on the coefficients $h_i(D)$
such that, if we enter with a weight-1 input symbol into the zero-input loop at state $S^1(D)$, the most
significant bit of the “predecessor” state $S^{2^m-1}(D)$ is zero? Using Eqs. (3) and (4), we can establish that

$$h_{i0} + h_{i,m} = 0 \qquad (5)$$

Obviously, when we enter the zero-input loop from the all-zero state and when we leave this loop to go
back to the all-zero state, we would like the parity output to be equal to 1. From Eqs. (1) and (5), we
require

$$h_{i0} = 1, \qquad h_{i,m} = 1 \qquad (6)$$



With this condition, we can enter the zero-input loop with a weight-1 symbol at state $S^1(D)$ and then
leave this loop from state $S^{2^m-1}(D)$ back to the all-zero state, for the same weight-1 input. The parity
weight of the codeword corresponding to weight-2 data sequences is then $2^{m-1} + 2$, where the first term
is the weight of the zero-input loop and the second term is due to the parity bit appearing when entering
and leaving the loop. If b = 1, the proof is complete, and the condition to achieve the upper bound is
given by Eq. (6). For b = 2, we may enter the zero-input loop with u = 10 at state $S^1(D)$ and leave the
loop to the zero state with u = 01 at some state $S^j(D)$. If we can choose $S^j(D)$ such that the output
weight of the zero-input loop from $S^1(D)$ to $S^j(D)$ is exactly $2^{m-1}/2$, then the output weight of the
zero-input loop from $S^{j+1}(D)$ to $S^{2^m-1}(D)$ is exactly $2^{m-1}/2$, and the minimum weight of codewords
corresponding to some weight-2 data sequences is

$$\frac{2^{m-1}}{2} + 2$$

In general, for any b, if we extend the procedure for b = 2, the minimum weight of the codewords
corresponding to weight-2 data sequences is

$$\left\lfloor \frac{2^{m-1}}{b} \right\rfloor + 2 \qquad (7)$$

where $\lfloor x \rfloor$ is the largest integer less than or equal to x. Clearly, this is the best achievable weight for the
minimum-weight codeword corresponding to weight-2 data sequences. This upper bound can be achieved
if the maximum run length of 1’s (m) in the zero-input loop does not exceed $\lfloor 2^{m-1}/b \rfloor$. If $m > \lfloor 2^{m-1}/b \rfloor$,
then the minimum weight of the codewords corresponding to weight-2 data sequences will be strictly less
than $\lfloor 2^{m-1}/b \rfloor + 2$.

The run property of ML LFSRs [14] can help us in designing codes achieving this upper bound.
Consider only runs of 1’s with length l for 0 < l < m - 1; then there are $2^{m-2-l}$ runs of length l, no runs
of length m - 1, and only one run of length m. □
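
The period, balance, and run properties invoked in the proof can be checked numerically for a small case. The sketch below is an illustration under assumptions of ours: the function names are hypothetical, and the feedback polynomial $1 + D + D^4$ (m = 4) is chosen only because it is a well-known primitive polynomial.

```python
from collections import Counter

def lfsr_period_sequence(taps):
    """One full period of a_k = sum_{i=1..m} taps[i] * a_{k-i} (mod 2).

    taps = [h_0, ..., h_m] with h_0 = h_m = 1; the register starts from the
    nonzero state (a_{-1}, ..., a_{-m}) = (1, 0, ..., 0).
    """
    m = len(taps) - 1
    start = (1,) + (0,) * (m - 1)
    state, seq = start, []
    while True:
        a = 0
        for i in range(1, m + 1):
            a ^= taps[i] & state[i - 1]
        seq.append(a)
        state = (a,) + state[:-1]
        if state == start:               # the register has completed its loop
            return seq

def cyclic_runs_of_ones(seq):
    """Count runs of 1's by length, treating seq as one period of a cyclic sequence."""
    z = seq.index(0)
    rotated = seq[z + 1:] + seq[:z + 1]  # rotate so the period ends with a 0
    runs, length = Counter(), 0
    for bit in rotated:
        if bit:
            length += 1
        elif length:
            runs[length] += 1
            length = 0
    return runs
```

For taps `[1, 1, 0, 0, 1]` the period is $2^4 - 1 = 15$, the weight per period is $2^3 = 8$, and the runs of 1's are one of length 4, none of length 3, one of length 2, and two of length 1, in agreement with the counts $2^{m-2-l}$ stated above.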

Corollary 1. For any r = b/n recursive systematic convolutional code with b inputs, b systematic
outputs, and n - b parity output bits using a primitive feedback generator, we have

$$d^p_2 \le \left\lfloor \frac{(n-b)\,2^{m-1}}{b} \right\rfloor + 2(n-b) \qquad (8)$$

Proof. The total output weight of a zero-input loop due to parity bits is $(n-b)2^{m-1}$. In this
zero-input loop, the largest minimum weight (due to parity bits) for entering and leaving the loop with any
weight-1 input symbol is $[(n-b)2^{m-1}]/b$. The output weight due to parity bits for entering and leaving
the zero-input loop (both into and from the all-zero state) is 2(n - b). □

There is an advantage to using b > 1, since the bound in Eq. (8) for rate b/bn codes is larger than the
bound for rate 1/n codes. Examples of codes are found that meet the upper bound for b/bn codes.
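
The bound of Eq. (8) is easy to tabulate. The helper below is a small sketch (the function name is ours) that evaluates $\lfloor (n-b)2^{m-1}/b \rfloor + 2(n-b)$, so the advantage of b > 1 can be seen directly.

```python
def parity_weight_bound(b, n, m):
    """Upper bound on the weight-2 parity weight from Eq. (8).

    b inputs, n outputs (n - b parity bits), primitive feedback of degree m.
    """
    return ((n - b) * 2 ** (m - 1)) // b + 2 * (n - b)

def output_weight_bound(b, n, m):
    """Bound on d_2 = parity weight + 2 when the systematic bits are transmitted."""
    return parity_weight_bound(b, n, m) + 2
```

For m = 4, a rate 1/3 code gives `parity_weight_bound(1, 3, 4) == 20`, while a rate 2/6 code gives `parity_weight_bound(2, 6, 4) == 24`. For rate 2/3 codes, `output_weight_bound(2, 3, 4) == 8` and `output_weight_bound(2, 3, 5) == 12`, the values of $d_2$ listed in Table 1 for the 16-state and 32-state codes.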

A. Best Rate b/(b + 1) Constituent Codes

We obtained the best rate 2/3 codes as shown in Table 1, where $d_2 = d^p_2 + 2$. The minimum-weight
codewords corresponding to weight-3 data sequences are denoted by $d_3$, $d_{min}$ is the minimum distance
of the code, and k = m + 1 in all the tables. By “best” we mean only codes with a large $d_2$ for a given
m that result in a maximum effective free distance. We obtained the best rate 3/4 codes as shown in
Table 2 and the best rate 4/5 codes as shown in Table 3.


Table 1. Best rate 2/3 constituent codes.

k    Code generator                          d_2    d_3    d_min
3    h_0 = 7    h_1 = 3    h_2 = 5            4      3      3
4    h_0 = 13   h_1 = 15   h_2 = 17           5      4      4
5    h_0 = 23   h_1 = 35   h_2 = 27           8      5      5
     h_0 = 23   h_1 = 35   h_2 = 33           8      5      5
6    h_0 = 45   h_1 = 43   h_2 = 61          12      6      6


Table 2. Best rate 3/4 constituent codes.

k    Code generator                                     d_2    d_3    d_min
3    h_0 = 7    h_1 = 5    h_2 = 3    h_3 = 1            3      3      3
     h_0 = 7    h_1 = 5    h_2 = 3    h_3 = 4            3      3      3
     h_0 = 7    h_1 = 5    h_2 = 3    h_3 = 2            3      3      3
4    h_0 = 13   h_1 = 15   h_2 = 17   h_3 = 11           4      4      4
5    h_0 = 23   h_1 = 35   h_2 = 33   h_3 = 25           5      4      4
     h_0 = 23   h_1 = 35   h_2 = 27   h_3 = 31           5      4      4
     h_0 = 23   h_1 = 35   h_2 = 37   h_3 = 21           5      4      4
     h_0 = 23   h_1 = 27   h_2 = 37   h_3 = 21           5      4      4


Table 3. Best rate 4/5 constituent codes.

k    Code generator                                                d_2    d_3    d_min
4    h_0 = 13   h_1 = 15   h_2 = 17   h_3 = 11   h_4 = 7            4      3      3
     h_0 = 13   h_1 = 15   h_2 = 17   h_3 = 11   h_4 = 5            4      3      3
5    h_0 = 23   h_1 = 35   h_2 = 33   h_3 = 37   h_4 = 31           5      4      4
     h_0 = 23   h_1 = 35   h_2 = 27   h_3 = 37   h_4 = 31           5      4      4
     h_0 = 23   h_1 = 35   h_2 = 21   h_3 = 37   h_4 = 31           5      4      4




B. Trellis Termination for b/n Codes 

Trellis termination is performed (for b = 2, as an example) by setting the switch shown in Fig. 2
in position B. The tap coefficients $a_{i0}, \ldots, a_{i,m-1}$ for $i = 1, 2, \ldots, b$ can be obtained by repeated use of
Eq. (2) and by solving the resulting equations. The trellis can be terminated in state zero with at least
m/b and at most m clock cycles. When Fig. 3 is extended to multiple input bits (b parallel feedback shift
registers), a switch should be used for each input bit.

C. Best Punctured Rate 1/2 Constituent Codes 

A rate 2/3 constituent code can be derived by puncturing the parity bit of a rate 1/2 recursive 
systematic convolutional code using, for example, a pattern P = [10]. A puncturing pattern P has zeros 
where parity bits are removed. 

Consider a rate 1/2 recursive systematic convolutional code $(1, g_1(D)/g_0(D))$. For an input u(D),
the parity output can be obtained as

$$x(D) = \frac{u(D)\,g_1(D)}{g_0(D)} \qquad (9)$$


We would like to puncture the output x(D) using, for example, the puncturing pattern P[10] (decimation
by 2) and obtain the generator polynomials $h_0(D)$, $h_1(D)$, and $h_2(D)$ for the equivalent rate 2/3 code:

$$G = \begin{bmatrix} 1 & 0 & h_1(D)/h_0(D) \\ 0 & 1 & h_2(D)/h_0(D) \end{bmatrix}$$

We note that any polynomial $f(D) = \sum_i a_i D^i$, $a_i \in GF(2)$, can be written as

$$f(D) = f_1(D^2) + D f_2(D^2) \qquad (10)$$

where $f_1(D^2)$ corresponds to the even power terms of f(D), and $D f_2(D^2)$ corresponds to the odd power
terms of f(D). Now, if we use this approach and apply it to u(D), $g_1(D)$, and $g_0(D)$, then we can
rewrite Eq. (9) as

$$x_1(D^2) + D x_2(D^2) = \frac{\left(u_1(D^2) + D u_2(D^2)\right)\left(g_{11}(D^2) + D g_{12}(D^2)\right)}{g_{01}(D^2) + D g_{02}(D^2)} \qquad (11)$$

where $x_1(D)$ and $x_2(D)$ correspond to the punctured output x(D) using puncturing patterns P[10] and
P[01], respectively. If we multiply both sides of Eq. (11) by $(g_{01}(D^2) + D g_{02}(D^2))$ and equate the even
and the odd power terms, we obtain two equations in two unknowns, namely $x_1(D)$ and $x_2(D)$. For
example, solving for $x_1(D)$, we obtain

$$x_1(D) = u_1(D)\,\frac{h_1(D)}{h_0(D)} + u_2(D)\,\frac{h_2(D)}{h_0(D)} \qquad (12)$$

where $h_0(D) = g_0(D)$ and

$$\begin{aligned} h_1(D) &= g_{11}(D)\,g_{01}(D) + D\,g_{12}(D)\,g_{02}(D) \\ h_2(D) &= D\,g_{12}(D)\,g_{01}(D) + D\,g_{11}(D)\,g_{02}(D) \end{aligned} \qquad (13)$$



From the second equation in Eq. (13), it is clear that $h_{2,0} = 0$. A similar method can be used to show
that for P[01] we get $h_{1,m} = 0$. These imply that the condition of Eq. (6) will be violated. Thus, we have
the following theorem.

Theorem 2. If the parity puncturing pattern is P = [10] or P = [01], then it is impossible to achieve
the upper bound on $d_2 = d^p_2 + 2$ for rate 2/3 codes derived by puncturing rate 1/2 codes.

The best rate 1/2 constituent codes with puncturing pattern P = [10] that achieve the largest $d_2$ are
given in Table 4.
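
The even/odd decomposition behind Eqs. (10) through (13) can be checked with a short program. The sketch below is ours and makes two assumptions: polynomials over GF(2) are stored as Python integers with bit i holding the coefficient of $D^i$ (how the octal labels in the tables map to powers of D is not stated in this section), and the helper names are hypothetical.

```python
def gf2_mul(a, b):
    """Carry-less (GF(2)) polynomial product."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def split_even_odd(f):
    """Return (f1, f2) with f(D) = f1(D^2) + D * f2(D^2), as in Eq. (10)."""
    f1 = f2 = 0
    i = 0
    while f >> i:
        bit = (f >> i) & 1
        if i % 2 == 0:
            f1 |= bit << (i // 2)
        else:
            f2 |= bit << (i // 2)
        i += 1
    return f1, f2

def punctured_equivalent(g0, g1):
    """Generators (h0, h1, h2) of the rate 2/3 code obtained with P = [10], Eq. (13)."""
    g01, g02 = split_even_odd(g0)
    g11, g12 = split_even_odd(g1)
    h1 = gf2_mul(g11, g01) ^ (gf2_mul(g12, g02) << 1)
    h2 = (gf2_mul(g12, g01) ^ gf2_mul(g11, g02)) << 1
    return g0, h1, h2

def series(u_bits, num, den, length):
    """First `length` coefficients of u(D) * num(D) / den(D); den has constant term 1."""
    a = []
    for k in range(length):
        bit = u_bits[k] if k < len(u_bits) else 0
        i = 1
        while den >> i:
            if (den >> i) & 1 and k - i >= 0:
                bit ^= a[k - i]
            i += 1
        a.append(bit)
    out = []
    for k in range(length):
        x, i = 0, 0
        while num >> i:
            if (num >> i) & 1 and k - i >= 0:
                x ^= a[k - i]
            i += 1
        out.append(x)
    return out
```

For $g_0 = 1 + D + D^2$ and $g_1 = 1 + D^2$ this gives $h_1 = 1 + D^2$ and $h_2 = D + D^2$, so $h_{2,0} = 0$ as Theorem 2 requires. Keeping the even-indexed parity bits of the rate 1/2 encoder reproduces $u_1 h_1/h_0 + u_2 h_2/h_0$, where $u_1$ and $u_2$ are the even- and odd-indexed input bits.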


Table 4. Best rate 1/2 punctured constituent codes.

k    Code generator           d_2    d_3    d_min
3    g_0 = 7    g_1 = 5        4      3      3
4    g_0 = 13   g_1 = 15       5      4      4
5    g_0 = 23   g_1 = 37       7      4      4
     g_0 = 23   g_1 = 31       7      4      4
     g_0 = 23   g_1 = 33       6      5      5
     g_0 = 23   g_1 = 35       6      4      4
     g_0 = 23   g_1 = 27       6      4      4


D. Best Rate 1/n Constituent Codes


For rate 1/n codes, the upper bound in Eq. (8) for b = 1 reduces to

$$d^p_2 \le (n-1)(2^{m-1} + 2)$$

This upper bound was originally derived in [5], where the best rate 1/2 constituent codes meeting the
bound were obtained. Here we present a simple proof based on our previous general result on rate b/n
codes. Then we obtain the best rate 1/3 and 1/4 codes.

Theorem 3. For rate 1/n recursive systematic convolutional codes with primitive feedback, we have

$$d^p_2 \le (n-1)(2^{m-1} + 2)$$



Proof. Consider a rate 1/n code, shown in Fig. 3. In this figure, $g_0(D)$ is assumed to be a primitive
polynomial. As discussed above, the output weight of the zero-input loop for parity bits is $2^{m-1}$,
independent of the choice of $g_i(D)$, $i = 1, 2, \ldots, n - 1$, provided that $g_i(D) \ne 0$ and that $g_i(D) \ne g_0(D)$, by
the shift-and-add and balance properties of ML LFSRs. If S(D) represents the state polynomial, then
we can enter the zero-input loop only at state $S^1(D) = 1$ and leave the loop to the all-zero state at state
$S^{2^m-1}(D) = D^{m-1}$. The ith parity output on the transition $S^{2^m-1}(D) \to S^1(D)$ with a zero input bit is

$$x_i = g_{i0} + g_{i,m}$$

If $g_{i0} = 1$ and $g_{i,m} = 1$ for $i = 1, \ldots, n - 1$, the output weight of the encoder for that transition is zero.
The output weight due to the parity bits when entering and leaving the zero-input loop is (n - 1) for
each case. In addition, the output weight of the zero-input loop will be $(n-1)2^{m-1}$ for (n - 1) parity
bits. Thus, we established the upper bound on $d^p_2$ for rate 1/n codes. □
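
The bound of Theorem 3 can be verified for small codes by encoding the weight-2 input $1 + D^{2^m-1}$ and counting parity ones. The sketch below is an illustration under assumptions of ours: polynomials are integers with bit i holding the coefficient of $D^i$, the function names are hypothetical, and the example generators are chosen by us rather than taken from the tables.

```python
def parity_series(u_bits, num, den, length):
    """First `length` coefficients of u(D) * num(D) / den(D) over GF(2).

    Polynomials are integers (bit i = coefficient of D^i); den has constant term 1.
    """
    a = []
    for k in range(length):
        bit = u_bits[k] if k < len(u_bits) else 0
        i = 1
        while den >> i:
            if (den >> i) & 1 and k - i >= 0:
                bit ^= a[k - i]
            i += 1
        a.append(bit)
    out = []
    for k in range(length):
        x, i = 0, 0
        while num >> i:
            if (num >> i) & 1 and k - i >= 0:
                x ^= a[k - i]
            i += 1
        out.append(x)
    return out

def weight2_parity_weight(g0, g1, m):
    """Parity weight for the input 1 + D^(2^m - 1), assuming g0 is primitive of degree m.

    The two input ones are one LFSR period apart, so the encoder goes around
    the zero-input loop once and returns to the all-zero state.
    """
    period = 2 ** m - 1
    u = [0] * (period + 1)
    u[0] = u[period] = 1
    return sum(parity_series(u, g1, g0, 4 * period))
```

With $g_0 = 1 + D + D^3$ (m = 3), the generators $g_1 = 1 + D + D^2 + D^3$ and $g_1 = 1 + D^2 + D^3$ both satisfy $g_{1,0} = g_{1,m} = 1$ and give weight $2^{m-1} + 2 = 6$, whereas $g_1 = D + D^2$ violates the condition and gives only 4.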


We obtained the best rate 1/3 and 1/4 codes without parity repetition, as shown in Tables 5 and 6,
where $d_2 = d^p_2 + 2$ represents the minimum output weight given by weight-2 data sequences. The best
rate 1/2 constituent codes are given by $g_0$ and $g_1$ in Table 5, as was also reported in [5].


Table 5. Best rate 1/3 constituent codes.

k    Code generator                                   d_2    d_3    d_min
2    g_0 = 3    g_1 = 2    g_2 = 1                     4     ∞       4
3    g_0 = 7    g_1 = 5    g_2 = 3                     8      7      7
4    g_0 = 13   g_1 = 17   g_2 = 15                   14     10     10
5    g_0 = 23   g_1 = 33   g_2 = [illegible]          22     12     10
     g_0 = 23   g_1 = 25   g_2 = 37                   22     11     11


Table 6. Best rate 1/4 constituent codes.

k    Code generator                                     d_2    d_3    d_min
4    g_0 = 13   g_1 = 17   g_2 = 15   g_3 = 11          20     12     12
5    g_0 = 23   g_1 = 35   g_2 = 27   g_3 = 37          32     16     14
     g_0 = 23   g_1 = 33   g_2 = 27   g_3 = 37          32     16     14
     g_0 = 23   g_1 = 35   g_2 = 33   g_3 = 37          32     16     14
     g_0 = 23   g_1 = 33   g_2 = 37   g_3 = 25          32     15     15


E. Recursive Systematic Convolutional Codes With a Nonprimitive Feedback Polynomial 

So far, we assumed that the feedback polynomial for recursive systematic convolutional codes is a
primitive polynomial. We could ask whether it is possible to exceed the upper bound given in Theorem 1
and Corollary 1 by using a nonprimitive polynomial. The answer is negative, thanks to a new theorem
by Solomon W. Golomb (Appendix).

Theorem 4.$^1$ For any rate 1/n linear recursive systematic convolutional code generated by a
nonprimitive feedback polynomial, the upper bound in Theorem 3 cannot be achieved, i.e.,

$$d^p_2 < (n-1)(2^{m-1} + 2)$$

Proof. Using the results of Golomb (see the Appendix) for a nonprimitive feedback polynomial, there
are more than two cycles (zero-input loops) in the LFSR. The “zero cycle” has weight zero, and the weights
of other cycles are nonzero. Thus, the weight of each cycle due to the results of the Appendix is strictly
less than $(n-1)2^{m-1}$. If we enter from the all-zero state with input weight-1 to one of the cycles of the
shift register, then we have to leave the same cycle to the all-zero state with input weight-1, as discussed
in Theorem 1. Thus, $d^p_2 < (n-1)(2^{m-1} + 2)$. □

Theorem 5. For any rate b/(b + 1) linear recursive systematic convolutional code generated by a
nonprimitive feedback polynomial, the upper bound in Theorem 1 cannot be exceeded, i.e.,

$$d^p_2 \le \left\lfloor \frac{2^{m-1}}{b} \right\rfloor + 2$$

Proof. Again using the results of the Appendix, there is a “zero cycle” with weight zero and at least
two cycles with nonzero weights, say q cycles with weights $w_1, w_2, \ldots, w_q$. The sum of the weights of all
cycles is exactly $2^{m-1}$, i.e., $\sum_i w_i = 2^{m-1}$. For a b/(b + 1) code, we have b weight-1 symbols. Suppose that
with $b_i$ of these weight-1 symbols we enter from the all-zero state to the ith cycle with weight $w_i$; then we
have to leave the same cycle to the all-zero state with the same $b_i$ symbols for $i = 1, 2, \ldots, q$, such that
$\sum_i b_i = b$. Based on the discussion in the proof of Theorem 1, the largest achievable minimum output
weight of codewords corresponding to weight-2 sequences is $\min(w_1/b_1, w_2/b_2, \ldots, w_q/b_q) + 2$. But it is
easy to show that $\min(w_1/b_1, w_2/b_2, \ldots, w_q/b_q) \le \sum_i w_i / \sum_i b_i = 2^{m-1}/b$. □


1 The proofs of Theorems 4 and 5 are based on a result by S. W. Golomb (see the Appendix), University of Southern
California, Los Angeles, California, 1995. Theorem 4 and Corollary 2 were proved for more general cases when the
code is generated by multiple LFSRs by R. J. McEliece, Communications Systems and Research Section, Jet Propulsion
Laboratory, Pasadena, California, and California Institute of Technology, Pasadena, California, 1995, using a state-space
approach.

Corollary 2. For any rate b/n linear recursive systematic convolutional code generated by a
nonprimitive feedback polynomial, the upper bound in Corollary 1 cannot be exceeded.

Proof. The proof is similar to the proof of Theorem 5, but now $\sum_i w_i = (n-b)2^{m-1}$. □
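
The cycle structure used in these proofs can be enumerated directly for small registers. The sketch below is ours and rests on stated assumptions: states and polynomials are integers with bit i holding the coefficient of $D^i$, the zero-input transition is $S \to D\,S \bmod h_0$ as in Eq. (2), and the weight of a cycle is counted on the single output of Eq. (1) with zero input, i.e., the most significant state bit. The function names are hypothetical.

```python
def zero_input_cycles(h0, m):
    """Partition all 2^m states into the cycles of S -> D*S mod h0(D).

    h0 is an integer of degree m with constant term 1, so the map is a
    bijection and every state lies on exactly one cycle.
    """
    seen, cycles = set(), []
    for start in range(2 ** m):
        if start in seen:
            continue
        cycle, s = [], start
        while s not in seen:
            seen.add(s)
            cycle.append(s)
            s <<= 1                      # multiply by D
            if s >> m:                   # degree m reached: reduce modulo h0
                s ^= h0
        cycles.append(cycle)
    return cycles

def cycle_weight(cycle, m):
    """Number of ones output around the cycle (most significant state bit)."""
    return sum((s >> (m - 1)) & 1 for s in cycle)
```

For the nonprimitive polynomial $1 + D + D^2 + D^3 + D^4$ (m = 4), the states split into the zero cycle and three cycles of length 5 with weights 2, 2, and 4. These sum to $2^{m-1} = 8$, and each is strictly less than 8. For the primitive polynomial $1 + D + D^4$ there is a single nonzero cycle of length 15 and weight 8.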


IV. Turbo Decoding for Multiple Codes 

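In [9] we described a new turbo decoding scheme for q codes based on approximating the optimum
bit decision rule. The scheme is based on solving a set of nonlinear equations given by (q = 3 is used to
illustrate the concept)

$$\begin{aligned}
L_{0k} &= 2\rho\, y_{0k} \\
\tilde{L}_{1k} &= \log \frac{\sum_{u:u_k=1} P(y_1|u) \prod_{j \ne k} e^{u_j (L_{0j} + \tilde{L}_{2j} + \tilde{L}_{3j})}}{\sum_{u:u_k=0} P(y_1|u) \prod_{j \ne k} e^{u_j (L_{0j} + \tilde{L}_{2j} + \tilde{L}_{3j})}} \\
\tilde{L}_{2k} &= \log \frac{\sum_{u:u_k=1} P(y_2|u) \prod_{j \ne k} e^{u_j (L_{0j} + \tilde{L}_{1j} + \tilde{L}_{3j})}}{\sum_{u:u_k=0} P(y_2|u) \prod_{j \ne k} e^{u_j (L_{0j} + \tilde{L}_{1j} + \tilde{L}_{3j})}} \\
\tilde{L}_{3k} &= \log \frac{\sum_{u:u_k=1} P(y_3|u) \prod_{j \ne k} e^{u_j (L_{0j} + \tilde{L}_{1j} + \tilde{L}_{2j})}}{\sum_{u:u_k=0} P(y_3|u) \prod_{j \ne k} e^{u_j (L_{0j} + \tilde{L}_{1j} + \tilde{L}_{2j})}}
\end{aligned} \qquad (14)$$

for $k = 1, 2, \ldots, N$. In Eq. (14), $\tilde{L}_{ik}$ represents extrinsic information and $y_i$, i = 0, 1, 2, 3 are the received
observation vectors corresponding to $x_i$, i = 0, 1, 2, 3 (see Fig. 1), where $\rho = \sqrt{2rE_b/N_0}$, if we assume
the channel noise samples have unit variance per dimension. The final decision is then based on

$$L_k = L_{0k} + \tilde{L}_{1k} + \tilde{L}_{2k} + \tilde{L}_{3k} \qquad (15)$$

which is passed through a hard limiter with a zero threshold.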

The above set of nonlinear equations is derived from the optimum bit decision rule, i.e.,

$$L_k = \log \frac{\sum_{u:u_k=1} P(y_0|u)\,P(y_1|u)\,P(y_2|u)\,P(y_3|u)}{\sum_{u:u_k=0} P(y_0|u)\,P(y_1|u)\,P(y_2|u)\,P(y_3|u)} \qquad (16)$$

using the following approximation:

$$P(u|y_i) \approx \prod_{k=1}^{N} \frac{e^{u_k \tilde{L}_{ik}}}{1 + e^{\tilde{L}_{ik}}} \qquad (17)$$


Note that, in general, $P(u|y_i)$ is not separable. The smaller the Kullback cross entropy [3,17] between
right and left distributions in Eq. (17), the better is the approximation and, consequently, the closer is
turbo decoding to the optimum bit decision.

We attempted to solve the nonlinear equations in Eq. (14) for $\tilde{L}_1$, $\tilde{L}_2$, and $\tilde{L}_3$ by using the iterative
procedure

$$\tilde{L}^{(m+1)}_{1k} = \alpha^{(m)}_1 \log \frac{\sum_{u:u_k=1} P(y_1|u) \prod_{j \ne k} e^{u_j (L_{0j} + \tilde{L}^{(m)}_{2j} + \tilde{L}^{(m)}_{3j})}}{\sum_{u:u_k=0} P(y_1|u) \prod_{j \ne k} e^{u_j (L_{0j} + \tilde{L}^{(m)}_{2j} + \tilde{L}^{(m)}_{3j})}} \qquad (18)$$

for $k = 1, 2, \ldots, N$, iterating on m. Similar recursions hold for $\tilde{L}^{(m)}_{2k}$ and $\tilde{L}^{(m)}_{3k}$. The gain $\alpha^{(m)}_1$ should
be equal to one, but we noticed experimentally that better convergence can be obtained by optimizing
this gain for each iteration, starting from a value less than 1 and increasing toward 1 with the iterations,
as is often done in simulated annealing methods. We start the recursion with the initial condition$^2$
$[\ldots] = L_0$. For the computation of Eq. (18), we use a modified MAP algorithm$^3$ with
permuters (direct and inverse) where needed, as shown in Fig. 4. The MAP algorithm [1] always starts
and ends at the all-zero state since we always terminate the trellis as described in [8]. We assumed $\pi_1 = I$
(identity); however, any $\pi_1$ can be used. The overall decoder is composed of block decoders connected
as in Fig. 4, which can be implemented as a pipeline or by feedback. In [10] and [11], we proposed an
alternative version of the above decoder that is more appropriate for use in turbo trellis-coded modulation,
i.e., set $L_0 = 0$ and consider $y_0$ as part of $y_1$. If the systematic bits are distributed among encoders, we
use the same distribution for $y_0$ among the MAP decoders.



2 Note that the components of the $\tilde{L}_i$’s corresponding to the tail bits, i.e., $\tilde{L}_{ik}$ for $k = N + 1, \ldots, N + M_i$, are set to zero
for all iterations.

3 The modified MAP algorithm is described in S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, “Soft-Output
Decoding Algorithms in Iterative Decoding of Parallel Concatenated Convolutional Codes,” submitted to ICC ’96.


At this point, further approximation for turbo decoding is possible if one term corresponding to a
sequence u dominates other terms in the summation in the numerator and denominator of Eq. (18).
Then the summations in Eq. (18) can be replaced by “maximum” operations with the same indices, i.e.,
replacing $\sum_{u:u_k=i}$ with $\max_{u:u_k=i}$ for i = 0, 1. A similar approximation can be used for $\tilde{L}_{2k}$ and $\tilde{L}_{3k}$ in
Eq. (14). This suboptimum decoder then corresponds to a turbo decoder that uses soft output Viterbi
(SOVA)-type decoders rather than MAP decoders. Further approximations, i.e., replacing $\sum$ with max,
can also be used in the MAP algorithm.$^4$
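
The effect of replacing sums with maxima can be seen on a toy log-likelihood ratio. In the sketch below, each list holds the log-domain metrics of the terms in a numerator or denominator sum; the function names and the sample metrics are ours and purely illustrative.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def llr_exact(metrics_one, metrics_zero):
    """log of (sum over terms with u_k = 1) / (sum over terms with u_k = 0)."""
    return log_sum_exp(metrics_one) - log_sum_exp(metrics_zero)

def llr_max_log(metrics_one, metrics_zero):
    """Same ratio with each sum replaced by its largest term."""
    return max(metrics_one) - max(metrics_zero)
```

When one term dominates each sum, for example `metrics_one = [10.0, 1.0, 0.0]` and `metrics_zero = [2.0, -5.0]`, the two results differ by less than 0.001. When several terms are comparable, the max-based value can be off by as much as the logarithm of the number of terms.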

A. Decoding Multiple Input Convolutional Codes 

If the rate b/n constituent code is not equivalent to a punctured rate 1/n' code or if turbo trellis-coded
modulation is used, we can first use the symbol MAP algorithm$^5$ to compute the log-likelihood ratio of
a symbol $u = (u_1, u_2, \ldots, u_b)$ given the observation y as

$$\lambda(u) = \log \frac{P(u|y)}{P(0|y)}$$

where 0 corresponds to the all-zero symbol. Then we obtain the log-likelihood ratios of the jth bit within
the symbol by

$$L(u_j) = \log \frac{\sum_{u:u_j=1} e^{\lambda(u)}}{\sum_{u:u_j=0} e^{\lambda(u)}}$$


In this way, the turbo decoder operates on bits, and bit, rather than symbol, interleaving is used. 
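
The symbol-to-bit conversion can be sketched as follows. This is a minimal illustration: the function name and the dictionary representation of the symbol log-likelihood ratios (keyed by b-bit tuples) are assumptions of ours, not part of the symbol MAP algorithm referenced above.

```python
import math
from itertools import product

def bit_llrs_from_symbol_llrs(lam, b):
    """Bit log-likelihood ratios L(u_j), j = 0..b-1, from symbol ratios.

    lam maps each b-bit tuple u to log(P(u|y) / P(0|y)), so lam of the
    all-zero tuple is 0.
    """
    symbols = list(product((0, 1), repeat=b))
    llrs = []
    for j in range(b):
        num = sum(math.exp(lam[u]) for u in symbols if u[j] == 1)
        den = sum(math.exp(lam[u]) for u in symbols if u[j] == 0)
        llrs.append(math.log(num / den))
    return llrs
```

As a sanity check, if the two bits of a symbol are independent with $P(u_1 = 1) = 0.8$ and $P(u_2 = 1) = 0.3$, the function returns $\log(0.8/0.2)$ and $\log(0.3/0.7)$, the individual bit log-likelihood ratios.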


V. Performance and Simulation Results 

The BER performance of these codes was evaluated by using transfer function bounds [4,6,12]. In [12],
it was shown that transfer function bounds are very useful for SNRs above the cutoff rate threshold and
that they cannot accurately predict performance in the region between cutoff rate and capacity. In this
region, the performance was computed by simulation.


Figure 5 shows the performance of turbo codes with m iterations and an interleaver size of N = 16,384.
The following codes are used as examples:


(1) Rate 1/2 Turbo Codes.

Code A: Two 16-state, rate 2/3 constituent codes are used to construct a rate 1/2 turbo
code as shown in Fig. 6. The (worst-case) minimum codeword weights, $d_i$, corresponding
to a weight-i input sequence for this code are $d_{ef} = 14$, $d_3 = 7$, $d_4 = 8$, $d_5 = 5 = d_{min}$, and
$d_6 = 6$.


4 Ibid.

5 Ibid.

Code B: A rate 1/2 turbo code also was constructed by using a differential encoder and a
32-state, rate 1/2 code, as shown in Fig. 7. This is an example where the systematic bits
for both encoders are not transmitted. The (worst-case) minimum codeword weights, $d_i$,
corresponding to a weight-i input sequence for this code are $d_{ef} = 19$, $d_4 = 6 = d_{min}$, $d_6 = 9$,
$d_8 = 8$, and $d_{10} = 11$. The output weights for odd i are large.

(2) Rate 1/3 Turbo Code. 

Code C: Two 16-state, rate 1/2 constituent codes are used to construct a rate 1/3 turbo 
code as shown in Fig. 8. The (worst-case) minimum codeword weights, d*, corresponding 
to a weight-z input sequence for this code are d e f— 22, d 3 = 11, d 4 =12, d 5 = 9 = d mm > 
d 6 = 14, and d 7 =15. 

(3) Rate 1/4 Turbo Code. 

Code D: Two 16-state, rate 1/2 and rate 1/3 constituent codes are used to construct 
a rate 1/4 turbo code, as shown in Fig. 9, with d e f = 32, d 3 = 15 = d min , d 4 = 16, 
d 5 = 17, do — 16, and d 7 = 19. 

(4) Rate 1/15 Turbo Code. 

Code E: Two 16-state, rate 1/8 constituent codes are used to construct a rate 1/15 
turbo code, $(1, g_1/g_0, g_2/g_0, g_3/g_0, g_4/g_0, g_5/g_0, g_6/g_0, g_7/g_0)$ and 
$(g_1/g_0, g_2/g_0, g_3/g_0, g_4/g_0, g_5/g_0, g_6/g_0, g_7/g_0)$, with $g_0 = (23)_{octal}$, 
$g_1 = (21)_{octal}$, $g_2 = (25)_{octal}$, $g_3 = (27)_{octal}$, $g_4 = (31)_{octal}$, 
$g_5 = (33)_{octal}$, $g_6 = (35)_{octal}$, and $g_7 = (37)_{octal}$. The (worst-case) 
minimum codeword weights, $d_i$, corresponding to a weight-$i$ input sequence for this code 
are $d_{ef} = 142$, $d_3 = 39 = d_{min}$, $d_4 = 48$, $d_5 = 45$, $d_6 = 50$, and $d_7 = 63$. 
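
The generators listed for Codes A through E are written in octal, and the design relies on the feedback polynomial being primitive. As a minimal sketch (the helper names are illustrative and not from the article, and it assumes that $g_0$, or $h_0$ in Fig. 6, is the feedback polynomial, as the ratios $g_i/g_0$ suggest), the following Python snippet reads an octal generator as a polynomial over GF(2) and computes the order of $x$ modulo that polynomial. A degree-$m$ polynomial with nonzero constant term is primitive exactly when this order equals $2^m - 1$.

```python
def octal_to_poly(octal_str):
    """Read an octal generator such as '23' as a GF(2) polynomial bit mask."""
    return int(octal_str, 8)


def order_of_x(poly):
    """Smallest k >= 1 such that x**k == 1 (mod poly) over GF(2)."""
    if poly < 2 or poly & 1 == 0:
        raise ValueError("need degree >= 1 and a nonzero constant term")
    degree = poly.bit_length() - 1
    top = 1 << degree
    r = 1
    # The order never exceeds 2**degree - 1 when the constant term is 1.
    for k in range(1, 1 << degree):
        r <<= 1              # multiply the running remainder by x
        if r & top:
            r ^= poly        # reduce modulo poly
        if r == 1:
            return k
    raise RuntimeError("unreachable for a nonzero constant term")


def is_primitive(poly):
    """True when the order of x equals 2**degree - 1."""
    degree = poly.bit_length() - 1
    return order_of_x(poly) == (1 << degree) - 1


if __name__ == "__main__":
    for name in ("23", "67", "37"):
        p = octal_to_poly(name)
        print(name, bin(p), order_of_x(p), is_primitive(p))
```

Under these assumptions, 23 (octal) has order 15 and 67 (octal) has order 31, so both pass the check. For contrast, 37 (octal), which appears above only as a feedforward polynomial, has order 5 and is not primitive.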

Simulation of the other codes reported in this article is still in progress. 
Fig. 6. Rate 1/2 turbo code constructed from two codes ($h_0 = 23$, $h_1 = 35$, $h_2 = 33$). 

Fig. 7. Rate 1/2 turbo code constructed from a differential encoder and code 
($g_0 = 67$, $g_1 = 73$). 

Fig. 8. Rate 1/3 turbo code constructed from two identical codes 
($g_0 = 23$, $g_1 = 33$). 

Fig. 9. Rate 1/4 turbo code constructed from two codes 
($g_0 = 23$, $g_1 = 33$) and ($g_0 = 23$, $g_1 = 37$, $g_2 = 25$). 


VI. Turbo Trellis-Coded Modulation 

A pragmatic approach for turbo codes with multilevel modulation was proposed in [16]. Here we 
propose a different approach that outperforms the results in [16] when M-ary quadrature amplitude 
modulation (M-QAM) or M-ary phase shift keying (MPSK) modulation is used. A straightforward 
method for the use of turbo codes for multilevel modulation is first to select a rate $b/(b+1)$ constituent 
code, where the outputs are mapped to a $2^{b+1}$-level modulation based on Ungerboeck’s set partitioning 
method [21] (i.e., we can use Ungerboeck’s codes with feedback). If MPSK modulation is used, for every $b$ 
bits at the input of the turbo encoder, we transmit two consecutive $2^{b+1}$ phase-shift keying (PSK) signals, 
one per each encoder output. This results in a throughput of $b/2$ bits/s/Hz. If M-QAM modulation is 
used, we map the $b+1$ outputs of the first component code to the $2^{b+1}$ in-phase levels (I-channel) of a 
$2^{2b+2}$-QAM signal set and the $b+1$ outputs of the second component code to the $2^{b+1}$ quadrature levels 
(Q-channel). The throughput of this system is $b$ bits/s/Hz. 
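
The constellation sizes and throughputs of this straightforward mapping follow directly from $b$. The short Python sketch below only tabulates that arithmetic; the function name and dictionary keys are illustrative and not taken from the article.

```python
def straightforward_mapping(b):
    """Sizes and throughputs when all b + 1 outputs of each rate
    b/(b+1) constituent code are mapped to one modulation symbol."""
    if b < 1:
        raise ValueError("b must be a positive integer")
    bits_per_encoder = b + 1
    return {
        # MPSK: two consecutive 2**(b+1)-PSK symbols carry b input bits.
        "psk_size": 2 ** bits_per_encoder,
        "psk_throughput": b / 2,
        # M-QAM: 2**(b+1) levels on I and on Q, one symbol per b input bits.
        "qam_size": 2 ** (2 * bits_per_encoder),
        "qam_throughput": float(b),
    }


if __name__ == "__main__":
    for b in (1, 2, 3, 4):
        print(b, straightforward_mapping(b))
```

For example, $b = 2$ gives 8-PSK at 1 bit/s/Hz or 64-QAM at 2 bits/s/Hz, which makes the growth in constellation size with $b$ easy to see.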

First, we note that these methods require more levels of modulation than conventional trellis-coded 
modulation (TCM), which is not desirable in practice. Second, the input information sequences are used 
twice in the output modulation symbols, which also is not desirable. An obvious remedy is to puncture 
the output symbols of each trellis code and select the puncturing pattern such that the output symbols 
of the turbo code contain the input information only once. If the output symbols of the first encoder are 
punctured, for example as 101010..., the puncturing pattern of the second encoder must be nonuniform 
to guarantee that all information symbols are used, and it depends on the particular choice of interleaver. 
Now, for example, for $2^{b+1}$ PSK, a throughput $b$ can be achieved. This method has two drawbacks: It 
complicates the encoder and decoder, and the reliability of punctured symbols may not be fully estimated 
at the decoder. A better remedy, for rate $b/(b+1)$ ($b$ even) codes, is discussed in the next section. 


A. A New Method to Construct Turbo TCM 

For a $q = 2$ turbo code with rate $b/(b+1)$ constituent encoders, select $b/2$ of the systematic outputs and 
puncture the rest of the systematic outputs, but keep the parity bit of the $b/(b+1)$ code (note that the 
rate $b/(b+1)$ code may have been obtained already by puncturing a rate 1/2 code). Then do the same 
to the second constituent code, but select only those systematic bits that were punctured in the first 
encoder. This method requires at least two interleavers: The first interleaver permutes the bits selected 
by the first encoder and the second interleaver those punctured by the first encoder. For MPSK (or 
M-QAM), we can use $2^{1+b/2}$ PSK symbols (or $2^{1+b/2}$ QAM symbols) per encoder and achieve throughput 
$b/2$. For M-QAM, we can also use $2^{1+b/2}$ levels in the I-channel and $2^{1+b/2}$ levels in the Q-channel and 
achieve a throughput of $b$ bits/s/Hz. These methods are equivalent to a multidimensional trellis-coded 
modulation scheme (in this case, two multilevel symbols per branch) that uses $2^{b/2} \times 2^{1+b/2}$ symbols per 
branch, where the first symbol in the branch (which depends only on uncoded information) is punctured. 
Now, with these methods, the reliability of the punctured symbols can be fully estimated at the decoder. 
Obviously, the constituent codes for a given modulation should be redesigned based on the Euclidean 
distance. 

In this article, we give an example for $b = 2$ with 16-QAM modulation where, for simplicity, 
we can use the 2/3 codes in Table 1 with Gray code mapping. Note that this may result in suboptimum 
constituent codes for multilevel modulation. The turbo encoder with 16 QAM and two clock-cycle trellis 
termination is shown in Fig. 10. The BER performance of this code with the turbo decoding structure 
for two codes discussed in Section IV is given in Fig. 11. For permutations $\pi_1$ and $\pi_2$, we used S-random 
permutations [9] with S = 40 and S = 32, with a block size of 16,384 bits. 

For 8 PSK, we used two 16-state, rate 4/5 codes given in Section V to achieve throughput 2. The parallel 
concatenated trellis code with 8 PSK and two clock-cycle trellis termination is shown in Fig. 12. The BER 
performance of this code is given in Fig. 13. For 64 QAM, we used two 16-state, rate 4/5 codes given in 
Section V to achieve throughput 4. The parallel concatenated trellis code with 64 QAM and two clock-cycle 
trellis termination is shown in Fig. 14. The BER performance of this code is given in Fig. 15. For permutations 
$\pi_1$, $\pi_2$, $\pi_3$, and $\pi_4$ in Figs. 10, 12, and 14, we used random permutations, each with a block size of 4096 
bits. As was discussed above, there is no need to use four permutations; two permutations suffice, and 
they may even result in a better performance. Extension of the described method for construction of 
turbo TCM based on Euclidean distance is straightforward.$^6$ 
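
The bit allocation of the new method can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, and it assumes that the first encoder keeps the first $b/2$ systematic bits, a choice the text leaves open. It reproduces the constellation sizes and throughputs quoted for the 16-QAM, 8-PSK, and 64-QAM examples.

```python
def new_method_allocation(b):
    """Split b systematic bits between two rate b/(b+1) encoders.

    Assumption for illustration: encoder 1 keeps bits 0 .. b/2 - 1 and
    encoder 2 keeps the bits that encoder 1 punctured.
    """
    if b <= 0 or b % 2:
        raise ValueError("b must be a positive even integer")
    half = b // 2
    bits_per_symbol = half + 1  # kept systematic bits plus one parity bit
    return {
        "encoder1_systematic": list(range(half)),
        "encoder2_systematic": list(range(half, b)),
        # One 2**(1 + b/2)-ary PSK symbol per encoder: b bits in 2 symbols.
        "psk_size": 2 ** bits_per_symbol,
        "psk_throughput": b / 2,
        # 2**(1 + b/2) levels on I and on Q: b bits in one QAM symbol.
        "qam_size": 2 ** (2 * bits_per_symbol),
        "qam_throughput": float(b),
    }


if __name__ == "__main__":
    for b in (2, 4):
        print(b, new_method_allocation(b))
```

With $b = 2$ the sketch yields 16-QAM at 2 bits/s/Hz, and with $b = 4$ it yields 8-PSK at 2 bits/s/Hz or 64-QAM at 4 bits/s/Hz. Each information bit appears in exactly one encoder's output symbol.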


VII. Conclusions 

In this article, we have shown that powerful turbo codes can be obtained if multiple constituent codes 
are used. We reviewed an iterative decoding method for multiple turbo codes by approximating the 
optimum bit decision rule. We obtained an upper bound on the effective free Euclidean distance of $b/n$ 
codes. We found the best rate 2/3, 3/4, 4/5, and 1/3 constituent codes that can be used in the design 
of multiple turbo codes. We proposed new schemes that can be used for power- and bandwidth-efficient 
turbo trellis-coded modulation. 


6 This is discussed in S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, “Parallel Concatenated Trellis Coded 
Modulation,” submitted to ICC ’96. 
Fig. 12. Parallel concatenated trellis-coded modulation, 8 PSK, 2 bits/s/Hz. 

Fig. 13. BER performance of parallel concatenated trellis-coded modulation, 8 PSK, 
2 bits/s/Hz. 
Appendix 

A Bound on the Weights of Shift Register Cycles 1 


I. Introduction 

A maximum-length linear shift register sequence, also called a pseudonoise (PN) sequence or a maximal-length 
(m-)sequence, of degree $m$ has period $p = 2^m - 1$, with $2^{m-1}$ ones and $2^{m-1} - 1$ zeroes in each period. 
Thus, the weight of a PN cycle is $2^{m-1}$. From a linear shift register whose characteristic polynomial is 
reducible, or irreducible but not primitive, in addition to the “zero-cycle” of period 1, there are several 
other possible cycles, depending on the initial state of the register, and each of these cycles has a period 
less than $2^m - 1$. 
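
The two cases can be seen on a degree-4 register. The sketch below is a small illustration, not part of the original note: the function name and the tap convention (`taps[i] = 1` when $s_{n+i}$ enters the recurrence for $s_{n+m}$) are assumptions made here. It generates one cycle for the primitive polynomial $x^4 + x + 1$ and one for the irreducible but non-primitive polynomial $x^4 + x^3 + x^2 + x + 1$.

```python
def lfsr_cycle(taps, state):
    """Output bits of one full cycle of a linear feedback shift register.

    taps[i] = 1 means s[n+i] is added (mod 2) to form s[n+m].
    taps[0] must be 1 so that every state lies on a pure cycle.
    """
    if taps[0] != 1:
        raise ValueError("taps[0] must be 1 for a branchless cycle")
    start = tuple(state)
    current = start
    bits = []
    while True:
        bits.append(current[0])  # observe one fixed register position
        feedback = sum(t & s for t, s in zip(taps, current)) % 2
        current = current[1:] + (feedback,)
        if current == start:
            return bits


if __name__ == "__main__":
    pn = lfsr_cycle((1, 1, 0, 0), (1, 0, 0, 0))     # x^4 + x + 1
    short = lfsr_cycle((1, 1, 1, 1), (1, 0, 0, 0))  # x^4 + x^3 + x^2 + x + 1
    print(len(pn), sum(pn))
    print(len(short), sum(short))
```

For $m = 4$ the primitive case gives period 15 with 8 ones and 7 zeroes, while the non-primitive case started from the same state gives a cycle of period 5.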

The question is whether it is possible for any cycle, from any linear shift register of degree $m$, to have 
a weight greater than $2^{m-1}$. We shall show that the answer is “no” and that this result does not depend 
on the shift register being linear. 

II. The Main Result 

Let S be any feedback shift register of length m, linear or not. We need not even specify that the 
shift register produce “pure” cycles, without branches. We will use only the fact that each state of the 
shift register has a unique successor state. For any given initial state, we define the length L of the string 
starting from that state to be the number of states, counting from the initial state, prior to the second 
appearance of any state in the string. (In the case of branchless cycles, this is the length of the cycle with 
the given initial state.) 

The string itself is this succession of states of length L. The corresponding string sequence is the 
sequence of 0’s and 1’s appearing in the right-most position of the register (or any other specific position 
of the register that has been agreed upon) as the string goes through its succession of L states. 

Theorem 1. From a feedback shift register $S$ of length $m$, the maximum number of 1’s that can 
appear in any string sequence is $2^{m-1}$. 

Proof. There are $2^m$ possible states of the shift register $S$ altogether. In any fixed position of the shift 
register, $2^{m-1}$ of these states have a 0 and $2^{m-1}$ states have a 1. In a string of length $L$, all $L$ of the states 
are distinct, and in any given position of the register, neither 0 nor 1 can occur more than $2^{m-1}$ times. 
In particular, the weight of a string sequence from a register of length $m$ cannot exceed $2^{m-1}$. □ 

Corollary 1. No cycle from a feedback shift register of length $m$ can have weight exceeding $2^{m-1}$. 
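
The bound can be spot-checked numerically for small $m$. The sketch below is only an illustration under stated assumptions: the state is held as an $m$-bit integer, the observed position is the least significant bit, and the feedback is an arbitrary (generally nonlinear) lookup table drawn at random. The function names and the number of trials are choices made here, not part of the original note.

```python
import random


def string_weight(feedback, m, start):
    """Number of 1's in the string sequence that begins at state `start`.

    feedback[state] is the bit shifted in when the register holds `state`,
    so each state has exactly one successor, as the theorem requires.
    """
    seen = set()
    state = start
    weight = 0
    while state not in seen:      # stop before a state appears twice
        seen.add(state)
        weight += state & 1       # bit in the right-most position
        state = (state >> 1) | (feedback[state] << (m - 1))
    return weight


def max_string_weight(m, trials=200, seed=0):
    """Largest string weight seen over random feedback tables."""
    rng = random.Random(seed)
    best = 0
    for _ in range(trials):
        feedback = [rng.randint(0, 1) for _ in range(1 << m)]
        for start in range(1 << m):
            best = max(best, string_weight(feedback, m, start))
    return best


if __name__ == "__main__":
    for m in range(2, 6):
        print(m, max_string_weight(m), 2 ** (m - 1))
```

Random trials cannot prove the theorem, but no sampled register should ever exceed $2^{m-1}$, since the strings visit distinct states and only $2^{m-1}$ states carry a 1 in the observed position.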


1 S. W. Golomb, personal communication, University of Southern California, Los Angeles, California, 1995. 